82 research outputs found

    Using distributional similarity to organise biomedical terminology

    We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked-up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are defined for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of different measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy.
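
As a rough illustration of the idea in this abstract, the sketch below builds a context vector for each term from dependency triples and compares terms by cosine similarity. The abstract mentions a variety of measures without naming them, so cosine is only one assumed choice; the triples, relation labels, and function names are hypothetical and are not taken from GENIA or from Pro3Gres output.

```python
from collections import Counter
from math import sqrt


def context_vector(triples, term):
    """Count the (relation, context word) pairs observed for one term."""
    return Counter((rel, other) for head, rel, other in triples if head == term)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical dependency triples (term, relation, context word); illustrative only.
triples = [
    ("IL-2", "obj_of", "activate"),
    ("IL-2", "subj_of", "bind"),
    ("IL-4", "obj_of", "activate"),
    ("IL-4", "subj_of", "bind"),
    ("T-cell", "subj_of", "express"),
]

# Terms that share syntactic contexts score high; terms with none in common score zero.
sim_close = cosine(context_vector(triples, "IL-2"), context_vector(triples, "IL-4"))
sim_far = cosine(context_vector(triples, "IL-2"), context_vector(triples, "T-cell"))
```

In this toy data the two interleukin terms occur in identical contexts and so come out as near neighbours, which is the kind of signal one would then compare against a gold-standard semantic type.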

    Wave Blocking Phenomena and Ecological Applications

    The growing flow of people and goods around the globe has allowed new, non-native species to establish and spread in already fragile ecosystems. The introduction of invasive species can have a detrimental impact on the already established species. Thus, it is important that we understand the mechanisms that facilitate or prevent invasion. Since reaction-diffusion invasion models produce travelling waves, we can study invasion by looking at the mechanisms that allow for wave propagation failure, or wave-blocking. In this thesis we consider a perturbed reaction-diffusion model in which the perturbation resides in either the reaction or diffusion term. In doing so we exploit the underlying symmetry of our problem to define a region in the appropriate parameter space that leads to wave blocking. As a demonstrative example we apply our theory to the bistable equation and consider the effects of various perturbations.

    The Challenge of Technical Text

    When evaluating and comparing Answer Extraction and Question Answering systems one can distinguish between scenarios for different information needs such as the "Fact Finding", the "Problem Solving", and the "Generic Information" scenarios. For each scenario, specific types of questions and specific types of texts have to be taken into account, each one causing specific problems. We argue that comparative evaluations of such systems should not be limited to a single type of information need and one specific text type. We use the example of technical manuals and a working Answer Extraction system, "ExtrAns", to show that other, equally important problems will be encountered in the other cases. We also argue that the quality of the individual answers could be determined automatically through the parameters of correctness and succinctness, i.e. measures for recall and precision on the level of unifying predicates, against a (hand-crafted) gold standard of "ideal answers".
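
The predicate-level scoring proposed at the end of this abstract can be sketched as a set comparison between the predicates of a system answer and those of an "ideal answer". This is a minimal sketch under stated assumptions: exact string matching stands in for the unification the abstract refers to, and the predicate strings and the function name are hypothetical rather than taken from ExtrAns.

```python
def precision_recall(answer_preds, gold_preds):
    """Score an answer's predicates against a gold-standard 'ideal answer'.

    Assumption: predicates match only if their strings are identical; a real
    system would test whether the predicates unify.
    """
    answer, gold = set(answer_preds), set(gold_preds)
    matched = answer & gold
    precision = len(matched) / len(answer) if answer else 0.0
    recall = len(matched) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical logical-form predicates; illustrative only.
gold = {"copy(x, y)", "file(x)", "directory(y)"}
answer = {"copy(x, y)", "file(x)", "user(z)", "option(w)"}

p, r = precision_recall(answer, gold)
```

Here the answer recovers two of the three gold predicates but also carries two extra ones, so recall is higher than precision; going by the order of terms in the abstract, recall would track correctness and precision succinctness.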

    An automata based approach to biomedical named entity recognition

    ing an automata learning algorithm: Causal-State Splitting Reconstruction [1]. This algorithm has previously been applied to Named Entity Recognition [2], obtaining good results given the simplicity of the approach. The same approach has been applied to Biomedical NE identification, using the GENIA corpus 3.0, with 10-fold cross-validation. Our system attained F1 = 73.14%. These results can be compared directly to [3] and [4], which used the same data. The first system obtains F1 = 57.4% using ME Models, and the second one reports F1 = 79.2% using SVMs. Both improve their results using post-processing techniques, reaching F1 = 76.9% and F1 = 79.9% respectively. Our system does not use any post-processing techniques, and takes into account few features, so the results are considered very promising. In future work some post-processing will be developed to improve the results.

    Question answering in terminology-rich technical domains

    In this chapter we will first explore (in section 2) the peculiarities of technical documentation. The central role that terminology plays in technical domains (parallel to the role of named entities in open-domain question answering) will be explored in section 3. As an example of the practical application of QA in technical domains we then present (in section 4) a real-world system (ExtrAns), specifically designed for technical domains. 12 page(s)

    Anaphora resolution in ExtrAns

    The true power of anaphora resolution algorithms can only be gauged when embedded into specific Natural Language Processing (NLP) applications. In this paper we describe the anaphora resolution module from ExtrAns, an answer extraction system. The anaphora resolution module is based on Lappin and Leass’ original algorithm, which used McCord’s Slot Grammar as the inherent parser. We report how to port Lappin and Leass’ algorithm to Link Grammar, a freely available dependency-based parsing system that is used in a range of NLP applications. Finally, we report on how the equivalence classes that result from the anaphora resolution algorithm are incorporated into the logical forms used by ExtrAns. 8 page(s)

    NLP for Answer Extraction in Technical Domains

    In this paper we argue that question answering (QA) over technical domains is distinctly different from TREC-based QA or Web-based QA and that it cannot benefit from data-intensive approaches.

    A symbolic approach to automatic multiword term structuring

    This paper presents a three-level structuring of multiword terms (MWTs) based on lexical inclusion, WordNet similarity and a clustering approach. Term clustering by automatic data analysis methods offers an interesting way of organizing a domain’s knowledge structures, useful for several information-oriented tasks like science and technology watch, text mining, computer-assisted ontology population, and Question Answering (Q-A). This paper explores how this three-level term structuring brings to light the knowledge structures from a corpus of genomics and compares the mapping of the domain topics against a hand-built ontology (the GENIA ontology). Ways of integrating the results into a Q-A system are discussed.